This report summarises research efforts on (1) the demonstration of holographic
degeneracies in associative memories, (2) the procedure for designing fractal grids
for planar holograms, (3) the experimental demonstration of one- and two-layer neural
networks that are designed with such fractal sampling grids, (4) the experimental
demonstration of dynamic holographic memories that are capable of an arbitrarily
long sequence of adaptations, (5) the optical implementation of Kanerva's network
for hand-written character recognition, (6) the development of an anti-Hebbian
local learning algorithm for training multi-layer neural networks, (7) the experimental
demonstration of an optical radial basis function network, and (8) the demonstration
of a two-layer local-representation optical network for real-time face recognition.

This is the final report for the AFOSR sponsored program “Dense Modifiable
Interconnections Utilizing Photorefractive Volume Holograms” (AFOSR-89-0045). Here we
present an overview of the accomplishments, which have been described in more detail in
the semi-annual reports as well as the published papers that are listed at the end of this
report.

The basic architecture for holographic optical neural networks [1,2] is shown in Fig.
1, which can be a single stage of a multilayered system. Neurons in one plane are
interconnected with the neurons in other planes via holographic gratings stored in
light-sensitive media such as photorefractive crystals. Specifically, the light from a pixel at
the input neural plane is collimated and diffracted by a holographic grating. The diffracted
light is then focused by a lens onto a pixel at the output neural plane. The goal is to train
such a system, through modifications of the holographic interconnections, in order for it to
perform desirable computations. The success of this learning procedure depends critically
upon the optical hardware and the learning algorithm. It has been the objective of this
research program to realize dense modifiable interconnections in such adaptive systems
using photorefractive volume holograms.

A basic geometrical limitation on the density of interconnections achievable through
volume holograms is due to the finite volume of photorefractive crystals. Let $N$ be the
number of resolvable points in any one dimension for both the neural planes and the
hologram. There are $N^2$ pixels in both the input and output planes. On the other hand,
the total number of gratings available in the hologram is $N^3$. Therefore, if we want to
interconnect independently each of the input neurons to all the output neurons, only
$N^{3/2}$ pixels from the $N^2$ available sites at each plane can be selected for the
placement of neurons, since $N^{3/2} \times N^{3/2} = N^3$ connections use up every
available grating. The sampling procedure [3,4] is described as follows. Each time we
attempt to add a neuron to a new site at the input (output), we check to see whether this
new site is already connected to one of the neurons selected previously at the output
(input) by an existing grating. If it is, we eliminate this site from the sampling grid; if it
is not, we place a neuron at this site, which implies that gratings are established to connect
this new neuron to all the neurons now selected at the output (input). By iterating this
procedure, we can find sets of fractal sampling grids that must be used in the input and
output planes so as to guarantee that all the interconnections between the input and output
planes can be independently specified. A large family of sampling grids was derived using
this procedure.
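The site-by-site procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and `toy_grating_key` is an assumed stand-in for the real degeneracy condition, which depends on the recording geometry and is not specified in this report. The toy rule labels the grating that links an input pixel to an output pixel by the differences of their coordinates and of their squared radii.

```python
from itertools import product


def toy_grating_key(inp, out):
    """Assumed (toy) label of the grating linking input pixel inp to output pixel out.

    Two pixel pairs that yield the same label are treated as sharing one grating.
    """
    (x, y), (u, v) = inp, out
    return (u - x, v - y, (u * u + v * v) - (x * x + y * y))


def try_add(site, own, partners, used, key_for):
    """Add site to own unless an existing grating already links it to a partner."""
    new_keys = [key_for(site, q) for q in partners]
    # Reject if the new gratings collide with each other or with stored ones.
    if len(set(new_keys)) < len(new_keys) or used.intersection(new_keys):
        return False
    own.append(site)
    used.update(new_keys)  # gratings now connect site to every selected partner
    return True


def build_sampling_grids(n, grating_key=toy_grating_key):
    """Greedily select input and output sites on an n-by-n plane."""
    inputs, outputs, used = [], [], set()
    for site in product(range(n), repeat=2):
        # Attempt the site at the input plane, then at the output plane.
        try_add(site, inputs, outputs, used, lambda s, q: grating_key(s, q))
        try_add(site, outputs, inputs, used, lambda s, p: grating_key(p, s))
    return inputs, outputs, used


if __name__ == "__main__":
    ins, outs, gratings = build_sampling_grids(8)
    print(len(ins), "input sites,", len(outs), "output sites,", len(gratings), "gratings")
```

By construction every selected input-output pair owns a distinct grating label, so the number of stored gratings equals the number of input sites times the number of output sites. The order in which sites are visited is a free choice here; a different order generally yields a different pair of grids.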

To solve a practically significant problem, neural-net learning algorithms typically
require at least thousands of iterations (that is, modifications of the synaptic
interconnections). In the optical implementation shown in Fig. 1, each iteration requires an
additional holographic exposure to be made in the same crystal. Therefore, a very large
number of holograms must be superimposed in a learning architecture. The basic problem with
writing a large number of photorefractive holograms is that during the exposure of new
holograms, previously recorded holograms decay due to a photogenerated increase in the free
carrier density. As a result, the overall diffraction efficiency becomes inversely
proportional to the number of holographic exposures [5]. This rapid decrease of diffraction
efficiency severely limits the extent to which optical neural networks can be trained. One
method to overcome this problem is dynamic copying [6]. The basic idea is to transfer the
multiply exposed hologram to a second medium, and then copy it back with a single exposure to
rejuvenate the primary hologram. As a result, the overall diffraction efficiency after copying
becomes independent of the number of holographic exposures used to form the original hologram.
Several variations of this method have been demonstrated, including copying with
an all-optical feedback loop [7], with a pair of active phase conjugate mirrors [8], between
two photorefractive media, and with an optoelectronic feedback loop. These methods
allow us to construct optical learning networks capable of an arbitrarily long sequence of
adaptations.
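The erasure of earlier holograms by later exposures can be illustrated with a toy calculation. The sketch below assumes a simple first-order model in which a grating builds up exponentially toward saturation while it is written and decays exponentially during every later exposure; the function name, the time constants, and the independent treatment of each grating are illustrative assumptions, not values or methods taken from this report. It does not model the copying step.

```python
import math


def grating_amplitudes(m, t_exp, tau_write, tau_erase, a_sat=1.0):
    """Amplitudes of m sequentially written gratings under the assumed model.

    Every new exposure of duration t_exp attenuates all stored gratings by
    exp(-t_exp / tau_erase) and adds one grating of amplitude
    a_sat * (1 - exp(-t_exp / tau_write)).
    """
    decay = math.exp(-t_exp / tau_erase)
    fresh = a_sat * (1.0 - math.exp(-t_exp / tau_write))
    amps = []
    for _ in range(m):
        amps = [a * decay for a in amps]  # older gratings are partially erased
        amps.append(fresh)                # the newest grating is at full strength
    return amps


if __name__ == "__main__":
    for index, amp in enumerate(grating_amplitudes(5, 1.0, 10.0, 10.0), start=1):
        print(f"grating {index}: amplitude {amp:.4f}")
```

In this model the oldest grating is always the weakest, which is the imbalance that a single-exposure copy back into the primary crystal is meant to remove.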

The types of learning that can be implemented on an optical network fall into two
broad categories depending on the characteristics of the hidden layer representation:
learning with distributed representation and learning with local representation. In
distributed learning, the network is trained so that a large fraction of the hidden units is
turned on for each input, while in learning with local representation, only one or a small
number of hidden units are turned on for each input.

Examples of distributed learning networks include Kanerva’s network [9], the
Backpropagation network [10], the ALL network [11,12], and the tiling network. In
Kanerva’s network, the weights of the first-layer interconnections are random, and each input
is mapped to a sparse, distributed hidden representation. The second layer, trained by
the sum-of-outer-products rule, performs classification on the distributed hidden
representation. Figure 2 shows the optical implementation of Kanerva’s network, which
was constructed and trained for hand-written character recognition. After training with
104 patterns, all the training patterns were recognized correctly by the system. Figure
3 shows some examples of the input patterns, their distributed hidden representations, and
the responses of the output units. The position of the switched-on output unit indicates
which character in the alphabet is the input. The trained network was also tested with
520 handwritten character patterns (20 patterns from each class) that were not in the
training set. Of the 520 test patterns, 311 were correctly classified, giving an average
recognition rate of about 60%. This recognition rate is much better than random guessing
(4%), but far below what is required for a useful character recognition system. The reason
for the relatively poor performance on the test set is the choice of training algorithm,
specifically the fixed first-layer weights and the limited number of training cycles for the
second layer. This same system can be used to implement algorithms in which both layers
are fully trained with error-driven algorithms such as Backpropagation and ALL, which,
in computer simulations, give much better performance.
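The two-layer scheme described above (a fixed random first layer followed by a second layer trained with the sum-of-outer-products rule) can be sketched in software as follows. This is a minimal illustration, not the optical system: the bipolar random weights, the rule that the `k` most strongly driven hidden units switch on, the one-hot class targets, and all function names are assumptions made for the example.

```python
import random


def make_first_layer(n_in, n_hidden, rng):
    """Fixed random first-layer weights (assumed bipolar, +1 or -1)."""
    return [[rng.choice((-1, 1)) for _ in range(n_in)] for _ in range(n_hidden)]


def hidden_code(x, w1, k):
    """Sparse hidden code: the k most strongly driven hidden units are set to 1."""
    acts = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    winners = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)[:k]
    code = [0] * len(acts)
    for j in winners:
        code[j] = 1
    return code


def train_second_layer(patterns, labels, n_classes, w1, k):
    """Sum of outer products between one-hot targets and hidden codes."""
    w2 = [[0] * len(w1) for _ in range(n_classes)]
    for x, label in zip(patterns, labels):
        for j, hj in enumerate(hidden_code(x, w1, k)):
            w2[label][j] += hj  # only the target class row receives the hidden code
    return w2


def classify(x, w1, w2, k):
    """Index of the output unit with the largest response."""
    h = hidden_code(x, w1, k)
    scores = [sum(w * hj for w, hj in zip(row, h)) for row in w2]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    w1 = make_first_layer(64, 200, rng)
    patterns = [[rng.choice((-1, 1)) for _ in range(64)] for _ in range(5)]
    w2 = train_second_layer(patterns, list(range(5)), 5, w1, 10)
    print([classify(p, w1, w2, 10) for p in patterns])
```

Because the first layer is never adapted, a single pass over the training set fixes the second-layer weights; this mirrors the limitation noted above, where the fixed first layer restricts performance on patterns outside the training set.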

Learning in both the Backpropagation network and the ALL network aims at reducing
the output error at each iteration. While the former requires that the output error be
propagated backwards through the network, the latter does not. Although this advantage
comes at the price of a relatively slow learning rate, the simplicity of the ALL network makes
it very attractive for optical implementation in the near future. Both the Backpropagation
and ALL networks maintain a fixed size during training, and only the interconnections
are modified. The tiling network, however, does not have a fixed size. Rather, it
grows during training. Specifically, the network starts with a single hidden unit that is
trained with the perceptron algorithm, and more hidden units are added only when they
are needed. Compared with local learning, distributed learning can generally yield small
networks that can generalize well from a relatively small set of examples. However, these
networks are very difficult to train.

Local representation networks are relatively easy to train, but they usually require a
relatively large size and a large number of training samples. A typical example is the Radial
Basis Function (RBF) network. The first layer of the network is trained to generate an
array of basis function centers. When an input is given, the network calculates the
distance between the input and the centers, and a hidden unit is turned on when
the input is close to the corresponding center. The output response is a weighted sum of the
hidden layer responses. Optical RBF networks have been constructed using optical memory
disks [13] and a spatial multiplexing parallel architecture [14], and have been successfully
trained for hand-written character recognition.
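The forward pass of an RBF network as described above can be sketched as follows. The Gaussian response and its width `sigma` are assumptions chosen for illustration (the report only states that a hidden unit turns on when the input is close to its center), and the function names are hypothetical.

```python
import math


def rbf_hidden(x, centers, sigma):
    """Hidden responses: near 1 when x is close to a center, near 0 when far away."""
    responses = []
    for center in centers:
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        responses.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    return responses


def rbf_output(x, centers, weights, sigma):
    """Output response: weighted sum of the hidden layer responses."""
    hidden = rbf_hidden(x, centers, sigma)
    return sum(w * h for w, h in zip(weights, hidden))


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 10.0)]
    weights = [1.0, -1.0]
    print(rbf_output((0.2, -0.1), centers, weights, sigma=1.0))
```

An input near the first center drives only the first hidden unit, so the output is dominated by that unit's weight; this locality is what makes such networks easy to train but large.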

Recently, a two-layer local-representation optical network has been constructed and
trained to recognize in real time “Denk”, who is a student in our group. The network is
implemented with liquid crystal spatial light modulators for the neural planes and lithium
niobate photorefractive crystals for the interconnections. The network has approximately
60,000 units at the input plane, 30 hidden units, and a single output unit. The output unit
is turned on whenever the network classifies the input as Denk. The network was trained
with a video segment 2 minutes long, from which 180 frames were selected for training.
Specifically, each hidden unit was trained to respond to 6 frames. The trained network
classified the rest of the training tape almost flawlessly. The system was then tested by
presenting through a TV camera real-time input of Denk and other members of our group.
The system almost never misclassified other people as Denk and exhibited remarkable
tolerance to changes in aspect, illumination, and facial expression.


Figure 1. Basic architecture for holographic optical neural networks.

Figure 2. Optical implementation of Kanerva’s network. VM = video monitor, LCLV =
liquid crystal light valve, PR = photorefractive crystal, (P)BS = (polarizing) beam
splitter, RM = rotating mirror, L = lens, S = shutter.


Figure 3. Examples of the signals at the input (top), hidden (middle), and output (bottom)
layers in the experimental two-layer network.